Introduction

This project uses the NHIS Adult Summary Health Statistics (2019–2023) to examine how healthcare access changed during and after the COVID-19 pandemic.

This part of the project divides the work across tools according to their strengths.
My teammates used Tableau for exploratory data analysis and for high-level visualizations that highlight subgroup differences and overall patterns in healthcare access.

Building on that foundation, the analysis below uses Python and Plotly's interactive charts for a more detailed, time-oriented analysis. It examines temporal trends, recovery dynamics, and cross-topic comparisons from 2019 to 2023.

In [1]:
%%HTML
<script src="require.js"></script>
In [2]:
import plotly.io as pio
pio.renderers.default = "notebook"
In [3]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
In [4]:
df = pd.read_csv("/Users/emmazhou/Desktop/Datathon/Access_to_Care_Dataset.csv")

df.head()
Out[4]:
  | TOPIC | SUBTOPIC | SUBTOPIC_ID | TAXONOMY | TAXONOMY_ID | CLASSIFICATION | CLASSIFICATION_ID | GROUP | GROUP_ID | GROUP_ORDER | ... | ESTIMATE_TYPE | ESTIMATE_TYPE_ID | TIME_PERIOD | TIME_PERIOD_ID | ESTIMATE | STANDARD_ERROR | ESTIMATE_LCI | ESTIMATE_UCI | FLAG | FOOTNOTE_ID_LIST
0 | Angina/angina pectoris | NaN | NaN | Cardiovascular diseases | 60 | Total | 0 | Total | 1 | 1 | ... | Percent of population, crude | 1 | 2019 | NaN | 1.7 | NaN | 1.5 | 1.9 | NaN | NT_NHISA00,NT_NHISA999,FN_NHISA18,SC_NHISA00
1 | Angina/angina pectoris | NaN | NaN | Cardiovascular diseases | 60 | Total | 0 | Total | 1 | 1 | ... | Percent of population, crude | 1 | 2020 | NaN | 1.5 | NaN | 1.3 | 1.6 | NaN | NT_NHISA00,NT_NHISA999,FN_NHISA18,SC_NHISA00
2 | Angina/angina pectoris | NaN | NaN | Cardiovascular diseases | 60 | Total | 0 | Total | 1 | 1 | ... | Percent of population, crude | 1 | 2021 | NaN | 1.5 | NaN | 1.4 | 1.7 | NaN | NT_NHISA00,NT_NHISA999,FN_NHISA18,SC_NHISA00
3 | Angina/angina pectoris | NaN | NaN | Cardiovascular diseases | 60 | Total | 0 | Total | 1 | 1 | ... | Percent of population, crude | 1 | 2022 | NaN | 1.6 | NaN | 1.5 | 1.8 | NaN | NT_NHISA00,NT_NHISA999,FN_NHISA18,SC_NHISA00
4 | Angina/angina pectoris | NaN | NaN | Cardiovascular diseases | 60 | Total | 0 | Total | 1 | 1 | ... | Percent of population, crude | 1 | 2023 | NaN | 1.6 | NaN | 1.4 | 1.8 | NaN | NT_NHISA00,NT_NHISA999,FN_NHISA18,SC_NHISA00

5 rows × 25 columns

In [5]:
# Remove rows with FLAG values (data quality issues)
df = df[df["FLAG"].isna()].copy()

# Convert ESTIMATE and TIME_PERIOD columns to numeric, coercing errors to NaN
df["ESTIMATE"] = pd.to_numeric(df["ESTIMATE"], errors="coerce")
df["TIME_PERIOD"] = pd.to_numeric(df["TIME_PERIOD"], errors="coerce")

# Drop rows where ESTIMATE is missing (cannot analyze without estimates)
df = df.dropna(subset=["ESTIMATE"])

# Filter data to focus on the study period (2019-2023)
df = df[df["TIME_PERIOD"].between(2019, 2023)]
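Dropping flagged and missing rows can leave a series with fewer than five years, and a line chart will silently draw straight through the gap. The following is a minimal sketch of a coverage check, not part of the original analysis. The helper name `missing_years` and the toy rows are hypothetical; the column names are the ones used elsewhere in this notebook.

```python
import pandas as pd

def missing_years(df, keys=("TOPIC", "GROUP", "SUBGROUP"), years=range(2019, 2024)):
    """Return {series key: sorted list of missing years} for incomplete series."""
    expected = set(years)
    gaps = {}
    for key, sub in df.groupby(list(keys), dropna=False):
        missing = expected - set(sub["TIME_PERIOD"].astype(int))
        if missing:
            gaps[key] = sorted(missing)
    return gaps

# Hypothetical toy rows (placeholder values, not NHIS estimates):
# topic "A" has all five years, topic "B" is missing 2021.
toy = pd.DataFrame({
    "TOPIC": ["A"] * 5 + ["B"] * 4,
    "GROUP": ["Total"] * 9,
    "SUBGROUP": ["Total"] * 9,
    "TIME_PERIOD": [2019, 2020, 2021, 2022, 2023, 2019, 2020, 2022, 2023],
    "ESTIMATE": [1.0] * 9,
})

coverage_gaps = missing_years(toy)
print(coverage_gaps)
```

Calling `missing_years(df)` on the cleaned dataframe would list any topic and subgroup combinations that need a caveat before they are plotted.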

Module 1: Overall Trends and Cross-Topic Comparison

Questions addressed:

  • Are there trends over time (2019–2023) in delayed care or health outcomes?
  • How do patterns differ across health conditions or topics?

Approach:

  • Compare national trends in cost-related unmet medical care and doctor visit rates to capture system-wide disruption and recovery.
In [6]:
topic_1 = "Did not get needed medical care due to cost"
topic_2 = "Doctor visit among adults"

label_map = {
    topic_1: "Did NOT get needed care due to cost",
    topic_2: "Doctor visit rate"
}

color_map = {
    "Did NOT get needed care due to cost": "firebrick",
    "Doctor visit rate": "steelblue"
}
In [7]:
# Filter the main dataframe to include only national-level data (GROUP == "Total")
# for the two topics being compared: cost-related unmet care and doctor visits
compare_df = df[
    (df["GROUP"] == "Total") &
    (df["TOPIC"].isin([topic_1, topic_2]))
].copy()
In [8]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=[label_map[topic_1], label_map[topic_2]],
    shared_yaxes=False
)

for i, topic in enumerate([topic_1, topic_2], start=1):
    sub = compare_df[compare_df["TOPIC"] == topic].sort_values("TIME_PERIOD")

    fig.add_trace(
        go.Scatter(
            x=sub["TIME_PERIOD"],
            y=sub["ESTIMATE"],
            mode="lines+markers",
            name=label_map[topic],
            line=dict(color=color_map[label_map[topic]], width=3),
            marker=dict(size=8)
        ),
        row=1, col=i
    )

    # COVID marker
    fig.add_vline(x=2020, line_dash="dash", line_color="gray", row=1, col=i)

# Axis titles
fig.update_yaxes(title_text="Cost-related unmet need (%)", row=1, col=1)
fig.update_yaxes(title_text="Doctor visit rate (%)", row=1, col=2)

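# Zoom each y-axis to the observed range to make the trends visible.
# Note: the axes do not start at zero, so year-to-year changes look larger than they are.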
fig.update_yaxes(range=[5.8, 8.6], row=1, col=1)
fig.update_yaxes(range=[82.0, 85.1], row=1, col=2)

fig.update_layout(
    title="Pandemic Shock: Utilization Collapsed, Cost Barriers Shifted (2019–2023)",
    template="plotly_white",
    hovermode="x unified",
    legend_title="Metric",
    font=dict(size=14),
    margin=dict(t=80)
)

fig.show()

Figure 1: Pandemic Shock: Utilization Collapsed, Cost Barriers Shifted (2019–2023)

  • At the national level, cost-related unmet medical care declined sharply at the onset of the COVID-19 pandemic, while doctor visit rates simultaneously collapsed. On the surface, this divergence could be misread as an improvement in affordability.

  • A more plausible reading is that these opposing trends reflect a deeper disruption: access barriers did not disappear but became less visible as routine and preventive care was postponed or avoided altogether. People who do not seek care have no occasion to report that they could not afford it.

  • This pattern demonstrates a critical limitation of relying on single access indicators. During systemic shocks, reported barriers can move in the “right” direction even as access worsens, highlighting the need to interpret trends within the broader context of healthcare utilization.
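Because the two panels use very different scales, it can help to express each series as a change from its 2019 value, both in percentage points and relative to the baseline. The sketch below illustrates that calculation under stated assumptions: `change_from_baseline` is a hypothetical helper, and the toy numbers are placeholders rather than NHIS estimates.

```python
import pandas as pd

def change_from_baseline(df, baseline_year=2019):
    """Add change versus the baseline year for each TOPIC.

    pp_change      : difference in percentage points
    rel_change_pct : difference as a percent of the baseline value
    """
    out = df.sort_values(["TOPIC", "TIME_PERIOD"]).copy()
    # One baseline value per topic, looked up by topic name
    base = out.loc[out["TIME_PERIOD"] == baseline_year].set_index("TOPIC")["ESTIMATE"]
    baseline = out["TOPIC"].map(base)
    out["pp_change"] = out["ESTIMATE"] - baseline
    out["rel_change_pct"] = 100 * out["pp_change"] / baseline
    return out

# Hypothetical placeholder values, chosen only to make the arithmetic easy to follow
toy = pd.DataFrame({
    "TOPIC": ["unmet"] * 3 + ["visits"] * 3,
    "TIME_PERIOD": [2019, 2020, 2021] * 2,
    "ESTIMATE": [8.0, 6.0, 7.0, 80.0, 76.0, 78.0],
})

changes = change_from_baseline(toy)
print(changes)
```

In the toy data, a 2-point drop in the small indicator is a 25% relative decline, while a 4-point drop in the large one is only 5%. Applied to `compare_df`, the same two columns would show how the cost-barrier and doctor-visit series compare once their scales are taken into account.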

Module 2: Education-Based Inequality in Delayed Care

Questions addressed:

  • How do barriers differ by demographic categories?
  • Which subgroups experience the highest rates of delayed or unmet care?

Approach:

  • Visualize the gap between low- and high-education groups in the share of adults who did not get needed medical care due to cost, to assess uneven recovery over time.
In [9]:
import plotly.graph_objects as go

def plot_shaded_gap(
    df,
    topic,
    group,
    low_label,
    high_label,
    title,
    low_color="firebrick",
    high_color="seagreen"
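):
    """Plot two subgroup trend lines for one topic and shade the gap between them.

    Expects df to have TOPIC, GROUP, SUBGROUP, TIME_PERIOD and ESTIMATE columns.
    Returns a plotly Figure.
    """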
    sub_df = df[
        (df["TOPIC"] == topic) &
        (df["GROUP"] == group) &
        (df["SUBGROUP"].isin([low_label, high_label]))
    ].copy()

    low = sub_df[sub_df["SUBGROUP"] == low_label].sort_values("TIME_PERIOD")
    high = sub_df[sub_df["SUBGROUP"] == high_label].sort_values("TIME_PERIOD")

    fig = go.Figure()

    # Lower bound (advantaged group)
    fig.add_trace(
        go.Scatter(
            x=high["TIME_PERIOD"],
            y=high["ESTIMATE"],
            mode="lines+markers",
            name=high_label,
            line=dict(color=high_color, width=3),
            marker=dict(size=7)
        )
    )

    # Upper bound (disadvantaged group) + shaded gap
    fig.add_trace(
        go.Scatter(
            x=low["TIME_PERIOD"],
            y=low["ESTIMATE"],
            mode="lines+markers",
            name=low_label,
            line=dict(color=low_color, width=3),
            marker=dict(size=7),
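            fill="tonexty",  # shade down to the previous trace (the advantaged group)
            fillcolor="rgba(178,34,34,0.25)"  # translucent firebrick; fixed, does not follow low_color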
        )
    )

    fig.add_vline(x=2020, line_dash="dash", line_color="gray")

    fig.update_layout(
        title=title,
        xaxis_title="Year",
        yaxis_title="Percentage of Adults (%)",
        template="plotly_white",
        hovermode="x unified",
        legend_title=group,
        font=dict(size=14)
    )

    return fig
In [10]:
fig_edu = plot_shaded_gap(
    df=df,
    topic="Did not get needed medical care due to cost",
    group="Education",
    low_label="No high school diploma or GED",
    high_label="College degree or higher",
    title="Education Gap in Cost-related Delayed Care (2019–2023)",
    low_color="firebrick",
    high_color="seagreen"
)

fig_edu.show()

Figure 2: Education Gap in Cost-related Delayed Care (2019–2023)

Figure 2 shows persistent education-based disparities in cost-related unmet medical care over time. Adults without a high school diploma or GED consistently reported substantially higher rates of unmet medical need due to cost than those with a college degree or higher.

From 2019 to 2020, the gap between the two groups narrowed noticeably, reflecting a temporary compression during the initial COVID-19 shock. However, this narrowing does not indicate improved equity. Instead, it coincides with widespread disruptions in healthcare utilization, during which fewer individuals sought care overall.

After 2021, the education gap widened again as healthcare utilization gradually recovered. While cost-related delayed care among highly educated adults remained relatively stable, rates among adults with lower educational attainment rebounded more strongly. This pattern suggests an uneven recovery in which individuals with fewer educational resources faced greater difficulty re-engaging with the healthcare system once normal demand resumed.
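The narrowing and widening described above can be read off the chart, but the gap can also be computed directly as the low-group estimate minus the high-group estimate for each year. The following is a minimal sketch under stated assumptions: `subgroup_gap` is a hypothetical helper that takes the same arguments as `plot_shaded_gap`, and the toy values are placeholders, not NHIS estimates.

```python
import pandas as pd

def subgroup_gap(df, topic, group, low_label, high_label):
    """Return one row per year with both estimates and their gap in percentage points."""
    sub = df[
        (df["TOPIC"] == topic)
        & (df["GROUP"] == group)
        & (df["SUBGROUP"].isin([low_label, high_label]))
    ]
    # pivot raises ValueError if a (year, subgroup) pair appears more than once
    wide = sub.pivot(index="TIME_PERIOD", columns="SUBGROUP", values="ESTIMATE")
    wide["gap_pp"] = wide[low_label] - wide[high_label]
    wide["gap_change_vs_prev"] = wide["gap_pp"].diff()  # negative = gap narrowed
    return wide

# Hypothetical placeholder values for illustration only
toy = pd.DataFrame({
    "TOPIC": ["t"] * 6,
    "GROUP": ["Education"] * 6,
    "SUBGROUP": ["low"] * 3 + ["high"] * 3,
    "TIME_PERIOD": [2019, 2020, 2021] * 2,
    "ESTIMATE": [10.0, 8.0, 9.5, 4.0, 3.5, 3.5],
})

gaps = subgroup_gap(toy, "t", "Education", "low", "high")
print(gaps)
```

Using `pivot` rather than `pivot_table` is a deliberate choice here: it fails loudly on duplicate rows, which would happen if, for example, the data held more than one estimate type for the same subgroup and year (an assumption about the dataset, not something verified in this notebook).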

Module 3: Poverty-Based Inequality in Delayed Care

Questions addressed:

  • How do barriers differ by income-related factors?
  • Which populations remain at higher risk?

Approach:

  • Apply the same gap-based framework to poverty level to examine how financial constraints shape post-pandemic access to care.
In [11]:
fig_poverty = plot_shaded_gap(
    df=df,
    topic="Did not get needed medical care due to cost",
    group="Poverty level",
    low_label="<100% FPL",
    high_label="≥200% FPL",
    title="Poverty Gap in Cost-related Delayed Care (2019–2023)",
    low_color="darkred",
    high_color="steelblue"
)

# Use more descriptive legend labels; traces with other names are left unchanged
legend_names = {
    "<100% FPL": "Below Poverty Line (<100% FPL)",
    "≥200% FPL": "At or Above 200% of Poverty Line (≥200% FPL)",
}
fig_poverty.for_each_trace(
    lambda trace: trace.update(name=legend_names.get(trace.name, trace.name))
)

fig_poverty.show()

Figure 3: Poverty Gap in Cost-related Delayed Care (2019–2023)

Figure 3 shows an even more pronounced disparity when delayed care is examined through the lens of poverty level. Adults living below the federal poverty line (<100% FPL) consistently reported the highest rates of cost-related unmet medical care, while adults at or above 200% of the poverty line reported much lower rates throughout the period.

Similar to the education gap, the poverty-based gap narrowed during the early pandemic period. However, unlike education, the rebound among adults below the poverty line was stronger and more sustained after 2021. By 2023, cost-related delayed care among individuals below the poverty line had risen close to pre-pandemic levels, while rates among higher-income adults remained comparatively low.

This divergence suggests that immediate economic constraints play a particularly strong role in shaping access to care during recovery periods. While the pandemic temporarily suppressed reported cost barriers by reducing healthcare utilization overall, those barriers resurfaced quickly for individuals in poverty once care-seeking resumed.
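One way to put a number on "close to pre-pandemic levels" is the ratio of each subgroup's 2023 estimate to its 2019 estimate. The sketch below assumes a dataframe already filtered to one topic and one group. `recovery_ratio` is a hypothetical helper, and the toy values are placeholders rather than NHIS estimates.

```python
import pandas as pd

def recovery_ratio(df, baseline_year=2019, end_year=2023):
    """Return end-year estimate divided by baseline-year estimate, per SUBGROUP.

    A value near 1.0 means the rate has returned to roughly its baseline level.
    """
    wide = df.pivot(index="SUBGROUP", columns="TIME_PERIOD", values="ESTIMATE")
    return (wide[end_year] / wide[baseline_year]).rename("recovery_ratio")

# Hypothetical placeholder values for illustration only
toy = pd.DataFrame({
    "SUBGROUP": ["<100% FPL"] * 3 + ["≥200% FPL"] * 3,
    "TIME_PERIOD": [2019, 2020, 2023] * 2,
    "ESTIMATE": [12.0, 9.0, 11.25, 4.0, 3.0, 3.0],
})

ratios = recovery_ratio(toy)
print(ratios)
```

A ratio for a cost barrier is not a measure of improvement on its own: a value back near 1.0 for the group below the poverty line means the barrier has returned, which is the pattern this section describes.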

Cross-Cutting Insight: Invisible Barriers

Question addressed:

  • How do invisible barriers emerge when expected care availability diverges from reported access?

Approach:

  • Jointly analyze healthcare utilization and cost-related access barriers to highlight how crises can temporarily mask structural inequities.

Taken together, these findings show that the pandemic did not eliminate cost barriers; it changed who appeared in the data and when those barriers were visible.

When utilization fell, disparities appeared to narrow. When utilization returned, inequalities re-emerged, often as sharply as before. This pattern shows that equity cannot be assessed at a single point in time or through a single metric.
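The joint reading of the two indicators can be turned into a simple flag: a year is marked when the reported barrier and utilization are both below their baseline values, which is when an apparent improvement in the barrier deserves a second look. This is an illustrative heuristic introduced here, not a method from the original analysis or an established statistical test. The helper name `flag_masking_years` and the toy values are hypothetical.

```python
import pandas as pd

def flag_masking_years(barrier, utilization, baseline_year=2019):
    """Flag years where both series sit below their baseline-year values.

    barrier, utilization: pandas Series indexed by year.
    """
    joined = pd.DataFrame({"barrier": barrier, "utilization": utilization})
    delta = joined - joined.loc[baseline_year]  # change versus baseline, per column
    joined["possible_masking"] = (delta["barrier"] < 0) & (delta["utilization"] < 0)
    return joined

# Hypothetical placeholder values for illustration only
years = [2019, 2020, 2021, 2022]
barrier = pd.Series([8.0, 6.5, 6.0, 8.2], index=years)
utilization = pd.Series([85.0, 83.0, 84.0, 85.5], index=years)

flags = flag_masking_years(barrier, utilization)
print(flags)
```

The flag says nothing about whether a change is statistically meaningful; combining it with the `ESTIMATE_LCI` and `ESTIMATE_UCI` columns would be a natural next step.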

Understanding healthcare access requires a multi-dimensional lens that captures system-wide disruption, subgroup vulnerability, and recovery dynamics together. Without this perspective, the populations most at risk are precisely the ones most likely to disappear from view.